Overview

Dataset statistics

Number of variables: 9
Number of observations: 4177
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 514.1 KiB
Average record size in memory: 126.0 B

Variable types

NUM: 8
CAT: 1

Reproduction

Analysis started: 2020-07-19 12:15:11.322285
Analysis finished: 2020-07-19 12:15:23.555514
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
Diameter is highly correlated with Length and 2 other fields (High Correlation)
Length is highly correlated with Diameter and 2 other fields (High Correlation)
Whole weight is highly correlated with Length and 4 other fields (High Correlation)
Shucked weight is highly correlated with Whole weight and 1 other field (High Correlation)
Viscera weight is highly correlated with Length and 3 other fields (High Correlation)
Shell weight is highly correlated with Diameter and 2 other fields (High Correlation)

Variables

Sex
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 32.8 KiB

Value Count Frequency (%)
M 1528 36.6%
I 1342 32.1%
F 1307 31.3%
 

Length

Max length: 1
Mean length: 1
Min length: 1

Value Count Frequency (%)
Uppercase_Letter 3 100.0%

Value Count Frequency (%)
Latin 3 100.0%

Value Count Frequency (%)
ASCII 3 100.0%
 

Length
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 134
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5239920996
Minimum: 0.075
Maximum: 0.815
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.075
5-th percentile: 0.295
Q1: 0.45
Median: 0.545
Q3: 0.615
95-th percentile: 0.69
Maximum: 0.815
Range: 0.74
Interquartile range (IQR): 0.165

Descriptive statistics

Standard deviation: 0.1200929126
Coefficient of variation (CV): 0.2291884031
Kurtosis: 0.06462097389
Mean: 0.5239920996
Median Absolute Deviation (MAD): 0.09667826577
Skewness: -0.639873269
Sum: 2188.715
Variance: 0.01442230765
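
Several of the figures reported for Length follow arithmetically from the others, so they can be cross-checked. The Python sketch below is an illustration and not part of the generated report; the function name `derived_statistics` and its parameters are hypothetical. It assumes the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, the coefficient of variation is the standard deviation divided by the mean, the range is maximum minus minimum, and the IQR is Q3 minus Q1.

```python
def derived_statistics(n, total, std, minimum, maximum, q1, q3):
    """Recompute summary figures that follow from other reported values."""
    mean = total / n
    return {
        "mean": mean,                  # Sum / number of observations
        "variance": std ** 2,          # square of the standard deviation
        "cv": std / mean,              # coefficient of variation
        "range": maximum - minimum,    # Maximum - Minimum
        "iqr": q3 - q1,                # Q3 - Q1
    }


# Inputs copied from the Length summary in this report.
length = derived_statistics(
    n=4177, total=2188.715, std=0.1200929126,
    minimum=0.075, maximum=0.815, q1=0.45, q3=0.615,
)
for name, value in length.items():
    print(f"{name}: {value:.10g}")
```

The recomputed values should agree with the reported ones up to the rounding of the inputs; the same check can be applied to any other numeric variable by substituting its figures.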
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.075 0.1525 0.2325 0.2725 0.3475 ... 0.6525 0.6725 0.7275 0.7525 0.815 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0.625 94 2.3%
0.55 94 2.3%
0.575 93 2.2%
0.58 92 2.2%
0.62 87 2.1%
0.6 87 2.1%
0.5 81 1.9%
0.57 79 1.9%
0.63 78 1.9%
0.61 75 1.8%
Other values (124) 3317 79.4%

Value Count Frequency (%)
0.075 1 < 0.1%
0.11 1 < 0.1%
0.13 2 < 0.1%
0.135 1 < 0.1%
0.14 2 < 0.1%

Value Count Frequency (%)
0.815 1 < 0.1%
0.8 1 < 0.1%
0.78 2 < 0.1%
0.775 2 < 0.1%
0.77 3 0.1%
 

Diameter
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 111
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4078812545
Minimum: 0.055
Maximum: 0.65
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.055
5-th percentile: 0.22
Q1: 0.35
Median: 0.425
Q3: 0.48
95-th percentile: 0.545
Maximum: 0.65
Range: 0.595
Interquartile range (IQR): 0.13

Descriptive statistics

Standard deviation: 0.09923986613
Coefficient of variation (CV): 0.2433057784
Kurtosis: -0.04547558144
Mean: 0.4078812545
Median Absolute Deviation (MAD): 0.08029579943
Skewness: -0.6091981423
Sum: 1703.72
Variance: 0.00984855103
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.055 0.0975 0.1675 0.1925 0.2475 ... 0.5275 0.5525 0.5775 0.6025 0.65 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0.45 139 3.3%
0.475 120 2.9%
0.4 111 2.7%
0.5 110 2.6%
0.47 100 2.4%
0.48 91 2.2%
0.455 90 2.2%
0.46 89 2.1%
0.44 87 2.1%
0.485 83 2.0%
Other values (101) 3157 75.6%

Value Count Frequency (%)
0.055 1 < 0.1%
0.09 1 < 0.1%
0.095 1 < 0.1%
0.1 2 < 0.1%
0.105 4 0.1%

Value Count Frequency (%)
0.65 1 < 0.1%
0.63 3 0.1%
0.625 1 < 0.1%
0.62 1 < 0.1%
0.615 1 < 0.1%
 

Height
Real number (ℝ≥0)

Distinct count: 51
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1395163993
Minimum: 0
Maximum: 1.13
Zeros: 2
Zeros (%): < 0.1%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.075
Q1: 0.115
Median: 0.14
Q3: 0.165
95-th percentile: 0.2
Maximum: 1.13
Range: 1.13
Interquartile range (IQR): 0.05

Descriptive statistics

Standard deviation: 0.04182705661
Coefficient of variation (CV): 0.2998002873
Kurtosis: 76.02550923
Mean: 0.1395163993
Median Absolute Deviation (MAD): 0.03128485168
Skewness: 3.128817379
Sum: 582.76
Variance: 0.001749502664
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0225 0.0475 0.0625 0.0725 ... 0.2025 0.2175 0.2325 0.245 1.13 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0.15 267 6.4%
0.14 220 5.3%
0.155 217 5.2%
0.175 211 5.1%
0.16 205 4.9%
0.125 202 4.8%
0.165 193 4.6%
0.135 189 4.5%
0.145 182 4.4%
0.13 169 4.0%
Other values (41) 2122 50.8%

Value Count Frequency (%)
0 2 < 0.1%
0.01 1 < 0.1%
0.015 2 < 0.1%
0.02 2 < 0.1%
0.025 5 0.1%

Value Count Frequency (%)
1.13 1 < 0.1%
0.515 1 < 0.1%
0.25 3 0.1%
0.24 4 0.1%
0.235 6 0.1%
 

Whole weight
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 2429
Unique (%): 58.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8287421594
Minimum: 0.002
Maximum: 2.8255
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.002
5-th percentile: 0.1259
Q1: 0.4415
Median: 0.7995
Q3: 1.153
95-th percentile: 1.6949
Maximum: 2.8255
Range: 2.8235
Interquartile range (IQR): 0.7115

Descriptive statistics

Standard deviation: 0.4903890182
Coefficient of variation (CV): 0.5917268871
Kurtosis: -0.02364350427
Mean: 0.8287421594
Median Absolute Deviation (MAD): 0.400454117
Skewness: 0.5309585633
Sum: 3461.656
Variance: 0.2404813892
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2.00000e-03 1.35000e-02 1.69250e-01 1.14625e+00 1.38525e+00 1.56825e+00 1.80900e+00 2.27125e+00 2.82550e+00], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0.2225 8 0.2%
0.196 7 0.2%
0.97 7 0.2%
0.4775 7 0.2%
1.1345 7 0.2%
0.6765 6 0.1%
0.18 6 0.1%
0.3245 6 0.1%
0.5805 6 0.1%
0.494 6 0.1%
Other values (2419) 4111 98.4%

Value Count Frequency (%)
0.002 1 < 0.1%
0.008 1 < 0.1%
0.0105 1 < 0.1%
0.013 1 < 0.1%
0.014 1 < 0.1%

Value Count Frequency (%)
2.8255 1 < 0.1%
2.7795 1 < 0.1%
2.657 1 < 0.1%
2.555 1 < 0.1%
2.55 1 < 0.1%
 

Shucked weight
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 1515
Unique (%): 36.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3593674886
Minimum: 0.001
Maximum: 1.488
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.001
5-th percentile: 0.0524
Q1: 0.186
Median: 0.336
Q3: 0.502
95-th percentile: 0.7402
Maximum: 1.488
Range: 1.487
Interquartile range (IQR): 0.316

Descriptive statistics

Standard deviation: 0.221962949
Coefficient of variation (CV): 0.6176489417
Kurtosis: 0.5951236784
Mean: 0.3593674886
Median Absolute Deviation (MAD): 0.1794554199
Skewness: 0.7190979218
Sum: 1501.078
Variance: 0.04926755074
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000e-03 3.37500e-02 5.00750e-01 6.06750e-01 6.96250e-01 7.78750e-01 9.53000e-01 1.24925e+00 1.48800e+00], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0.175 11 0.3%
0.2505 10 0.2%
0.165 9 0.2%
0.096 9 0.2%
0.2025 9 0.2%
0.419 9 0.2%
0.302 9 0.2%
0.2945 9 0.2%
0.2 9 0.2%
0.21 9 0.2%
Other values (1505) 4084 97.8%

Value Count Frequency (%)
0.001 1 < 0.1%
0.0025 1 < 0.1%
0.0045 2 < 0.1%
0.005 3 0.1%
0.0055 2 < 0.1%

Value Count Frequency (%)
1.488 1 < 0.1%
1.351 1 < 0.1%
1.3485 1 < 0.1%
1.253 1 < 0.1%
1.2455 1 < 0.1%
 

Viscera weight
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 880
Unique (%): 21.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1805936079
Minimum: 0.0005
Maximum: 0.76
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.0005
5-th percentile: 0.027
Q1: 0.0935
Median: 0.171
Q3: 0.253
95-th percentile: 0.3796
Maximum: 0.76
Range: 0.7595
Interquartile range (IQR): 0.1595

Descriptive statistics

Standard deviation: 0.1096142503
Coefficient of variation (CV): 0.6069663902
Kurtosis: 0.084011749
Mean: 0.1805936079
Median Absolute Deviation (MAD): 0.08925215223
Skewness: 0.5918521514
Sum: 754.3395
Variance: 0.01201528386
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[5.0000e-04 4.2500e-03 3.3250e-02 2.4775e-01 3.1850e-01 3.8750e-01 4.2675e-01 5.2625e-01 7.6000e-01], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0.1715 15 0.4%
0.196 14 0.3%
0.061 13 0.3%
0.037 13 0.3%
0.2195 13 0.3%
0.0575 13 0.3%
0.159 12 0.3%
0.1905 12 0.3%
0.0265 12 0.3%
0.1625 12 0.3%
Other values (870) 4048 96.9%

Value Count Frequency (%)
0.0005 2 < 0.1%
0.002 1 < 0.1%
0.0025 2 < 0.1%
0.003 3 0.1%
0.0035 3 0.1%

Value Count Frequency (%)
0.76 1 < 0.1%
0.6415 1 < 0.1%
0.59 1 < 0.1%
0.575 1 < 0.1%
0.5745 1 < 0.1%
 

Shell weight
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 926
Unique (%): 22.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2388308595
Minimum: 0.0015
Maximum: 1.005
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 0.0015
5-th percentile: 0.0384
Q1: 0.13
Median: 0.234
Q3: 0.329
95-th percentile: 0.48
Maximum: 1.005
Range: 1.0035
Interquartile range (IQR): 0.199

Descriptive statistics

Standard deviation: 0.1392026695
Coefficient of variation (CV): 0.5828504316
Kurtosis: 0.5319261262
Mean: 0.2388308595
Median Absolute Deviation (MAD): 0.1124146562
Skewness: 0.6209268251
Sum: 997.5965
Variance: 0.0193773832
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0015 0.0295 0.03025 0.03475 0.03525 ... 0.4905 0.531 0.6275 0.7255 1.005 ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0.275 43 1.0%
0.25 42 1.0%
0.265 40 1.0%
0.315 40 1.0%
0.185 40 1.0%
0.285 37 0.9%
0.17 37 0.9%
0.175 36 0.9%
0.3 36 0.9%
0.22 36 0.9%
Other values (916) 3790 90.7%

Value Count Frequency (%)
0.0015 1 < 0.1%
0.003 1 < 0.1%
0.0035 1 < 0.1%
0.004 2 < 0.1%
0.005 12 0.3%

Value Count Frequency (%)
1.005 1 < 0.1%
0.897 1 < 0.1%
0.885 2 < 0.1%
0.85 1 < 0.1%
0.815 1 < 0.1%
 

Rings
Real number (ℝ≥0)

Distinct count: 28
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.933684463
Minimum: 1
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Memory size: 32.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 8
Median: 9
Q3: 11
95-th percentile: 16
Maximum: 29
Range: 28
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.224169032
Coefficient of variation (CV): 0.324569302
Kurtosis: 2.330687427
Mean: 9.933684463
Median Absolute Deviation (MAD): 2.362462357
Skewness: 1.114101898
Sum: 41493
Variance: 10.39526595
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 2.5 3.5 4.5 5.5 ... 15.5 17.5 20.5 23.5 29. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
9 689 16.5%
10 634 15.2%
8 568 13.6%
11 487 11.7%
7 391 9.4%
12 267 6.4%
6 259 6.2%
13 203 4.9%
14 126 3.0%
5 115 2.8%
Other values (18) 438 10.5%

Value Count Frequency (%)
1 1 < 0.1%
2 1 < 0.1%
3 15 0.4%
4 57 1.4%
5 115 2.8%

Value Count Frequency (%)
29 1 < 0.1%
27 2 < 0.1%
26 1 < 0.1%
25 1 < 0.1%
24 2 < 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a perfectly linear relationship the slope of the line does not change the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
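
As an illustration of this definition, the following Python sketch computes r from scratch using only the standard library. It is not how the report was generated; the function name `pearson_r` is hypothetical, and the example inputs are the Length and Diameter values of the ten first rows shown in the Sample section. The normalising factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations; the 1/(n-1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Length and Diameter of the ten first rows of the sample.
length = [0.455, 0.350, 0.530, 0.440, 0.330, 0.425, 0.530, 0.545, 0.475, 0.550]
diameter = [0.365, 0.265, 0.420, 0.365, 0.255, 0.300, 0.415, 0.425, 0.370, 0.440]
print(pearson_r(length, diameter))
```

With only ten rows the result is merely indicative and need not match the value computed on all 4177 observations.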

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
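
A minimal Python sketch of this procedure follows: rank both variables, then apply the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and assigning tied values the average of their ranks is an assumption (a common convention) that the text above does not specify.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A nonlinear but strictly increasing relationship.
x = [1, 2, 3, 4, 5]
y = [value ** 3 for value in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

Because the cubic relationship is strictly increasing, the two rank sequences are identical, so ρ reaches its maximum while Pearson's r on the raw values stays below it.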

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
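
The pair-counting definition can be sketched in Python as follows. The function name `kendall_tau_a` is hypothetical. It implements exactly the formula described above (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations commonly apply a tie correction (tau-b), so results may differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted in neither group
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop over pairs is quadratic in the number of observations, which is fine for illustration but slow for a dataset of several thousand rows.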

Missing values

Sample

First rows

Index  Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
0      M    0.455   0.365     0.095   0.5140        0.2245          0.1010          0.150         15
1      M    0.350   0.265     0.090   0.2255        0.0995          0.0485          0.070         7
2      F    0.530   0.420     0.135   0.6770        0.2565          0.1415          0.210         9
3      M    0.440   0.365     0.125   0.5160        0.2155          0.1140          0.155         10
4      I    0.330   0.255     0.080   0.2050        0.0895          0.0395          0.055         7
5      I    0.425   0.300     0.095   0.3515        0.1410          0.0775          0.120         8
6      F    0.530   0.415     0.150   0.7775        0.2370          0.1415          0.330         20
7      F    0.545   0.425     0.125   0.7680        0.2940          0.1495          0.260         16
8      M    0.475   0.370     0.125   0.5095        0.2165          0.1125          0.165         9
9      F    0.550   0.440     0.150   0.8945        0.3145          0.1510          0.320         19

Last rows

Index  Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
4167   M    0.500   0.380     0.125   0.5770        0.2690          0.1265          0.1535        9
4168   F    0.515   0.400     0.125   0.6150        0.2865          0.1230          0.1765        8
4169   M    0.520   0.385     0.165   0.7910        0.3750          0.1800          0.1815        10
4170   M    0.550   0.430     0.130   0.8395        0.3155          0.1955          0.2405        10
4171   M    0.560   0.430     0.155   0.8675        0.4000          0.1720          0.2290        8
4172   F    0.565   0.450     0.165   0.8870        0.3700          0.2390          0.2490        11
4173   M    0.590   0.440     0.135   0.9660        0.4390          0.2145          0.2605        10
4174   M    0.600   0.475     0.205   1.1760        0.5255          0.2875          0.3080        9
4175   F    0.625   0.485     0.150   1.0945        0.5310          0.2610          0.2960        10
4176   M    0.710   0.555     0.195   1.9485        0.9455          0.3765          0.4950        12